A probabilistic approach to determining biological structure: integrating uncertain data sources

Author

  • Russ B. Altman
Abstract

Modeling the structure of biological molecules is critical for understanding how these structures perform their function, and for designing compounds to modify or enhance this function (for medicinal or industrial purposes). The determination of molecular structure involves defining three-dimensional positions for each of the constituent atoms using a variety of experimental, theoretical and empirical data sources. Unfortunately, each of these data sources can be noisy or not available in sufficient abundance to determine the precise position of each atom. Instead, some atomic positions are precisely defined by the data, and others are poorly defined. An understanding of structural uncertainty is critical for properly interpreting structural models. We have developed a Bayesian approach for determining the coordinates of atoms in a three-dimensional space. Our algorithm takes as input a set of probabilistic constraints on the coordinates of the atoms, and an a priori distribution for each atom location. The output is a maximum a posteriori (MAP) estimate of the location of each atom. We introduce constraints as updates to the prior distributions. In this paper, we describe the algorithm and show its performance on three data sets. The first data set is synthetic and illustrates the convergence properties of the method. The other data sets comprise real biological data for a protein (the trp repressor molecule) and a nucleic acid (the transfer RNA fold). Finally, we describe how we have begun to extend the algorithm to make it suitable for non-Gaussian constraints.

The determination of molecular structure is critical for many pursuits in biomedicine and industry, including the study of how molecules perform their function and the design of drugs to remove, modify or enhance this function. It is estimated that there are about 100,000 different proteins in the human body, but only a few hundred structures are known and stored in the protein structural data bank (1977).
As the human genome project produces large amounts of information about the atomic makeup of individual molecules, it becomes critical to devise methods for estimating molecular structure—that is, for determining how the atoms within molecules arrange themselves in order to form three-dimensional structures. Biological macromolecules can be divided into proteins and nucleic acids (Stryer, 1988). Nucleic acids, such as DNA and RNA, encode the genetic blueprints for all living organisms as a linear sequence of four chemical building blocks. Although the structure of nucleic acids was once thought to be uniform and geared …
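
The abstract's idea of introducing constraints as updates to prior distributions can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes Gaussian priors, a single linear constraint, and atoms placed on a one-dimensional line rather than in three dimensions. The function name `apply_constraint` and all numeric values are hypothetical. Under these assumptions the posterior is again Gaussian, so its mean is the MAP estimate.

```python
def apply_constraint(mean, cov, h, target, noise_var):
    """Update a Gaussian estimate with one linear constraint h . x = target.

    mean: list of n coordinates, cov: symmetric n x n covariance (list of lists),
    h: list of n coefficients, noise_var: variance of the constraint itself.
    Returns the posterior (mean, cov).
    """
    n = len(mean)
    # Value of the constrained quantity predicted by the current estimate.
    predicted = sum(h[i] * mean[i] for i in range(n))
    # cov @ h, reused for both the gain and the covariance update.
    cov_h = [sum(cov[i][j] * h[j] for j in range(n)) for i in range(n)]
    # Variance of the residual: prior uncertainty plus constraint noise.
    s = sum(h[i] * cov_h[i] for i in range(n)) + noise_var
    gain = [c / s for c in cov_h]
    residual = target - predicted
    new_mean = [mean[i] + gain[i] * residual for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov_h[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Hypothetical setup: atom 0 is well defined (variance 1.0),
# atom 1 is poorly defined (variance 100.0), and they start uncorrelated.
mean = [0.0, 0.0]
cov = [[1.0, 0.0],
       [0.0, 100.0]]

# Hypothetical constraint: x1 - x0 = 3.8 with variance 0.01.
mean, cov = apply_constraint(mean, cov, h=[-1.0, 1.0], target=3.8, noise_var=0.01)

print("MAP positions:", mean)
print("variances:", cov[0][0], cov[1][1])
```

In this toy case the poorly defined atom inherits most of the precision of its well-defined neighbor, and the two coordinates become correlated. This mirrors the abstract's point that some positions are tightly defined by the data while others are not. A nonlinear constraint such as a 3D interatomic distance would need extra handling (for example, linearization), which this sketch does not attempt.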

Similar resources

Uncertain Groupings: Probabilistic Combination of Grouping Data

Probabilistic approaches for data integration have much potential [7]. We view data integration as an iterative process where data understanding gradually increases as the data scientist continuously refines his view on how to deal with learned intricacies like data conflicts. This paper presents a probabilistic approach for integrating data on groupings. We focus on a bio-informatics use case ...

On Uncertain Probabilistic Data Modeling

Uncertainty in data is caused by various reasons including data itself, data mapping, and data policy. For data itself, data are uncertain because of various reasons. For example, data from a sensor network, Internet of Things or Radio Frequency Identification is often inaccurate and uncertain because of devices or environmental factors. For data mapping, integrated data from various heterogono...

Uncertain Data Integration Using Functional Dependencies

Data integration systems are crucial for applications that need to provide a uniform interface to a set of autonomous and heterogeneous data sources. However, setting up a full data integration system for many application contexts, e.g. web and scientific data management, requires significant human effort which prevents it from being really scalable. In this paper, we propose IFD (Integration b...

Uncertain Data Integration with Probabilities

Real world applications that deal with information extraction, such as business intelligence software or sensor data management, must often process data provided with varying degrees of uncertainty. Uncertainty can result from multiple or inconsistent sources, as well as approximate schema mappings. Modeling, managing and integrating uncertain data from multiple sources has been an active area ...

The Effect of Tuned Mass Damper on Seismic Response of Building Frames with Uncertain Structural Characteristics

Tuned mass dampers (TMDs) could be used to absorb the input energy of the applied load, and reduce the response of the building frames. However, the effectiveness of TMD in reducing the response of the building frames could be affected by inherited uncertainties in the structural characteristics of the frames. In this study, in order to investigate the probabilistic response of steel moment-res...

Journal:
  • Int. J. Hum.-Comput. Stud.

Volume 42

Publication date: 1995